Information Retrieval Using Label Propagation Based Ranking
Abstract
The IR group participated in the cross-language information retrieval (CLIR) task at the sixth NTCIR workshop (NTCIR-6). In this paper, we describe our approach to the Chinese Single Language Information Retrieval (SLIR) task and the English-Chinese Bilingual CLIR (BLIR) task. We use both bi-grams and single Chinese characters as index units and OKAPI BM25 as the retrieval model. The initially retrieved documents are re-ranked before they are used for standard query expansion. Re-ranking is performed by a label propagation-based semi-supervised learning algorithm that exploits the intrinsic structure of the large document collection. Since labeled relevant or irrelevant documents are generally not available in IR, our approach extracts pseudo-labeled documents from the ranked list of the initial retrieval. For pseudo-relevant documents, we select a cluster of documents from the top of the list via cluster validation-based k-means clustering; for pseudo-irrelevant ones, we pick a set of documents from the bottom of the list. The documents are then ranked via label propagation. For the Chinese SLIR task, experiments show that our method achieves a mean average precision of 0.3097 (rigid) and 0.4013 (relaxed relevance judgment) on the T-only (title-based) run, and 0.3136 (rigid) and 0.4071 (relaxed) on the D-only (short description-based) run. For the English-Chinese BLIR task, experiments show that our method achieves a mean average precision of 0.2013 (rigid) and 0.2931 (relaxed) on the T-only run, and 0.1911 (rigid) and 0.2804 (relaxed) on the D-only run.
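The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a common clamped, iterative form of label propagation over a row-normalised document similarity graph, and it takes the pseudo-relevant and pseudo-irrelevant documents as given rather than deriving them by k-means clustering. The function name `label_propagation_rerank`, its parameters, and the neutral starting score of 0.5 are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def label_propagation_rerank(
    sim: Sequence[Sequence[float]],
    pseudo_relevant: Sequence[int],
    pseudo_irrelevant: Sequence[int],
    iterations: int = 100,
) -> List[int]:
    """Return document indices sorted by propagated relevance score.

    Hypothetical helper: `sim` is an n x n non-negative similarity matrix
    between the initially retrieved documents (an assumption of this sketch).
    """
    n = len(sim)
    # Clamped scores: 1.0 for pseudo-relevant, 0.0 for pseudo-irrelevant.
    clamped: Dict[int, float] = {i: 0.0 for i in pseudo_irrelevant}
    clamped.update({i: 1.0 for i in pseudo_relevant})

    # Row-normalise similarities into transition weights (self-loops dropped).
    trans: List[List[float]] = []
    for i in range(n):
        row = [sim[i][j] if j != i else 0.0 for j in range(n)]
        total = sum(row)
        if total > 0:
            trans.append([w / total for w in row])
        else:
            # Isolated document: it simply keeps its current score.
            trans.append([1.0 if j == i else 0.0 for j in range(n)])

    # Unlabeled documents start at a neutral 0.5 (assumed initial value).
    scores = [clamped.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        new_scores = [
            sum(trans[i][j] * scores[j] for j in range(n)) for i in range(n)
        ]
        # Re-clamp the pseudo-labeled documents after every propagation step.
        for i, value in clamped.items():
            new_scores[i] = value
        scores = new_scores

    # A higher score means the document sits closer to the relevant seeds.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

In this sketch the seeds keep their labels throughout, so unlabeled documents end up ordered by how strongly they are connected to the pseudo-relevant seeds relative to the pseudo-irrelevant ones. The similarity measure itself (for example, cosine similarity over term vectors) is left open.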
Similar resources
On the Robustness of Document Re-Ranking Techniques: A Comparison of Label Propagation, KNN, and Relevance Feedback
This paper describes our work at the sixth NTCIR workshop on the subtask of C-C single language information retrieval. We compared label propagation (LP), K-nearest neighboring (KNN), and relevance feedback (RF) for document re-ranking and found that RF is a more robust technique for performance improvement, while LP and KNN are sensitive to the choice and the number of relevant documents for s...
Markov Logic Sets: Towards Lifted Information Retrieval Using PageRank and Label Propagation
Inspired by “Google Sets” and Bayesian sets, we consider the problem of retrieving complex objects and relations among them, i.e., ground atoms from a logical concept, given a query consisting of a few atoms from that concept. We formulate this as a within-network relational learning problem using few labels only and describe an algorithm that ranks atoms using a score based on random walks wit...
Video semantic analysis based on structure-sensitive anisotropic manifold ranking
As a major family of semi-supervised learning (SSL), graph-based SSL has recently attracted considerable interest in the machine learning community along with application areas such as video semantic analysis. In this paper, we analyze the connections between graph-based SSL and partial differential equation (PDE)-based diffusion. From the viewpoint of PDE-based diffusion, the label propagation ...
A Selective Sampling Strategy for Label Ranking
We propose a novel active learning strategy based on the compression framework of [9] for label ranking functions which, given an input instance, predict a total order over a predefined set of alternatives. Our approach is theoretically motivated by an extension to ranking and active learning of Kääriäinen’s generalization bounds using unlabeled data [7], initially developed in the context of cl...
Data fusion and label weighting for image retrieval based on spatio-conceptual information
Using as an experimental platform an image retrieval method based on a spatio-conceptual representation of images, in this paper we investigate two main concerns in annotation-based image retrieval: label weighting and data fusion. On the one hand, we analyze the influence of different weighting schemes on the quality of the retrieval performance, and, on the other hand, we study the application of...
Publication date: 2007